Probabilistic Databases ∗ Dan

نویسنده

  • Dan Suciu
چکیده

Many applications today need to manage large data sets with uncertainties. In this paper we describe the foundations of managing data where the uncertainties are quantified as probabilities. We review the basic definitions of the probabilistic data model and present some fundamental theoretical results for query evaluation on probabilistic databases. 1 The Quest for Probabilistic Databases Commercial databases today are deterministic. Relational databases are rooted in First Order Logic and Finite Model Theory, as initially envisioned by Codd [10], and were originally motivated by applications like banking, payroll, accounting, inventory, all of which require a precise semantics of the data. Subsequently, databases and data management techniques have been extended to handle richer data models, such as Nested Relations [44], Object-relational data [21], temporal data [45], spatial data [36], and semistructured data and XML [46]. All these extensions rely on a deterministic semantics for both data and queries. Today, the database community needs to manage large volumes of data that is imprecise, or uncertain, and that contains an explicit representation of the uncertainty. Uncertain data occurs in large scale data integration [29], integration of life-science databases [39], in information extraction systems [24, 22, 33], in sensor data [8, 23, 19, 20], in activity recognition data [37, 9, 35]. Modeling and managing data where uncertainties are explicitly represented and numerically quantified requires a new paradigm whose foundation are based on probability theory, probabilistic inference [12], probability logic [2], and degrees of belief [6], in addition to Finite Model Theory. This paper presents the basic definitions of the probabilistic data model and some fundamental theoretical results for query evaluation on probabilistic databases. A precursor to probabilistic databases, ∗This research was supported in part by Suciu’s NSF CAREER grant IIS-0092955, NSF grants IIS-0415193, IIS-0627585, IIS-0513877, IIS-0428168, and a gift from Microsoft.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Probabilistic Views for Large-Scale Statistical Inference

Probabilistic databases extend statistical inference from limited, hand-crafted statistical models to an entire database. Data analysts can discover trends, test hypothesis, and run what-if scenarios by simply running SQL queries. The technical challenge in a probabilistic database is the query processor, which needs to perform a probabilistic inference for every row output by a SQL query: the ...

متن کامل

Query Processing on Probabilistic Data: A Survey

Probabilistic data is motivated by the need to model uncertainty in large databases. Over the last twenty years or so, both the Database community and the AI community have studied various aspects of probabilistic relational data. This survey presents the main approaches developed in the literature, reconciling concepts developed in parallel by the two research communities. The survey starts wi...

متن کامل

Factorized Databases: Past and Future Past

In this talk I will overview the FDB project at Oxford on succinct, lossless representations of relational data that I call factorized databases. I will first present a characterization of the succinctness of results to conjunctive queries and how factorizations can speed up query processing.I will then comment on how this succinctness characterization relates to seemingly disparate results on:...

متن کامل

MayBMS: A System for Managing Large Uncertain and Probabilistic Databases

MayBMS is a state-of-the-art probabilistic database management system that has been built as an extension of Postgres, an open-source relational database management system. MayBMS follows a principled approach to leveraging the strengths of previous database research for achieving scalability. This article describes the main goals of this project, the design of query and update language, effici...

متن کامل

Lifted Inference in Probabilistic Databases

Probabilistic Databases (PDBs) extend traditional relational databases by annotating each record with a weight, or a probability. Although PDBs define a very simple probability space, by simply adding constraints one can model much richer probability spaces, such as those represented by Markov Logic Networks or other Statistical Relational Models. While in traditional databases query evaluation...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008